-
Sergiy Migdalskiy - Performance Optimization, SIMD and Cache.
-
"A rehash of Sergiy Migdalskiy GDC 2015 talk: Performance Optimization for Physics".
-
He's from Valve; he worked on Left 4 Dead.
-
Excellent video!! Great explanation.
-
The talk covers minimizing branches, a different approach to pointers that lets data structures be stored to disk, SoA layouts, and SIMD.
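-
To make the branch-minimization point concrete, here is a minimal sketch (my own illustration, not code from the talk): replacing data-dependent branches with select-style operations lets the compiler typically emit SIMD min/max instead of conditional jumps.
```cpp
#include <cstddef>

// Branchy version: the compiler may emit a conditional jump per element,
// which the branch predictor can mispredict on irregular data.
void clamp_branchy(float* v, std::size_t n, float lo, float hi) {
    for (std::size_t i = 0; i < n; ++i) {
        if (v[i] < lo)      v[i] = lo;
        else if (v[i] > hi) v[i] = hi;
    }
}

// Branchless version: min/max-style selects typically map directly to
// SIMD instructions (e.g., minps/maxps) and keep the pipeline full.
void clamp_branchless(float* v, std::size_t n, float lo, float hi) {
    for (std::size_t i = 0; i < n; ++i) {
        float x = v[i];
        x = x < lo ? lo : x;   // typically compiles to a select/min, not a jump
        x = x > hi ? hi : x;   // typically compiles to a select/max, not a jump
        v[i] = x;
    }
}
```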
-
{36:10} SIMD.
-
-
Data items should be independent of each other, just as any parallel operation requires.
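-
A small sketch of why independence matters for layout (my example, not the talk's): Structure-of-Arrays keeps each field contiguous, so consecutive, independent elements map directly onto SIMD lanes and cache lines.
```cpp
#include <cstddef>
#include <vector>

// Array-of-Structures (AoS): each particle's fields are interleaved,
// so touching only the x components strides through memory.
struct ParticleAoS { float x, y, z, mass; };

// Structure-of-Arrays (SoA): each field is contiguous and independent,
// which is what SIMD lanes and the cache prefetcher want.
struct ParticlesSoA {
    std::vector<float> x, y, z, mass;
};

// With SoA, integrating positions touches only the arrays it needs,
// and each element is independent of its neighbors: vectorizable.
void integrate_x(ParticlesSoA& p, const std::vector<float>& vx, float dt) {
    for (std::size_t i = 0; i < p.x.size(); ++i)
        p.x[i] += vx[i] * dt;
}
```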
CPU ARM
NEON
-
Vector width: 128-bit registers
-
Typical lane count: 4 lanes × 32-bit (e.g., 4 × float32)
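-
A minimal NEON sketch (illustrative; assumes n is a multiple of 4) showing the 4 × float32 lanes in practice:
```cpp
#include <arm_neon.h>

// Adds two float arrays four lanes at a time with NEON intrinsics.
void add_f32_neon(const float* a, const float* b, float* out, int n) {
    for (int i = 0; i < n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);   // load 4 floats
        float32x4_t vb = vld1q_f32(b + i);
        float32x4_t vc = vaddq_f32(va, vb);  // 4 adds in one instruction
        vst1q_f32(out + i, vc);              // store 4 floats
    }
}
```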
SVE / SVE2 (Scalable Vector Extension)
-
Vector width: Variable.
-
Register width is not fixed (128–2048 bits in 128-bit steps).
-
Code is vector-length agnostic (designed to scale across hardware with different vector widths).
-
-
Typical lane count: implementation-defined; e.g., 8 lanes × 32-bit (8 × float32) on a 256-bit implementation
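-
A hedged SVE sketch of vector-length-agnostic code (ACLE intrinsics; names assume a recent arm_sve.h toolchain): the loop never hardcodes the register width, it asks the hardware via svcntw() and masks the tail with a predicate.
```cpp
#include <arm_sve.h>
#include <cstdint>

// The same binary runs on 128-bit or 512-bit SVE hardware because
// svcntw() (number of 32-bit lanes) and the predicate adapt at runtime.
void add_f32_sve(const float* a, const float* b, float* out, uint64_t n) {
    for (uint64_t i = 0; i < n; i += svcntw()) {
        svbool_t pg = svwhilelt_b32_u64(i, n);     // mask off the tail
        svfloat32_t va = svld1_f32(pg, a + i);
        svfloat32_t vb = svld1_f32(pg, b + i);
        svst1_f32(pg, out + i, svadd_f32_x(pg, va, vb));
    }
}
```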
CPU x86/x64 (Intel / AMD)
FMA3 / FMA4 (Fused multiply-add)
-
Is often used in combination with AVX/AVX2/AVX-512 (FMA3 is the widely supported variant; FMA4 was AMD-only and has since been dropped).
-
Platform: x86/x64 (Intel & AMD)
-
Vector width: 256-bit registers
-
Typical lane count: 8 lanes × 32-bit (8 × float32)
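-
A small FMA3 sketch (illustrative; assumes n is a multiple of 8): one fused instruction computes a*b + c across 8 float lanes, with a single rounding step.
```cpp
#include <immintrin.h>

// a*b + c over 8 floats per iteration (FMA3, available on AVX2-class CPUs).
void fmadd_f32_fma(const float* a, const float* b, const float* c,
                   float* out, int n) {
    for (int i = 0; i < n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        __m256 vc = _mm256_loadu_ps(c + i);
        _mm256_storeu_ps(out + i, _mm256_fmadd_ps(va, vb, vc));  // va*vb + vc
    }
}
```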
AVX-512
-
Adoption is limited (mainly HPC, data centers, and select desktop/server chips).
-
Includes masking, scatter/gather, and more advanced operations.
-
Platform: x86/x64 (select Intel CPUs and AMD Zen 4 and later; not universally available)
-
Vector width: 512-bit registers
-
Typical lane count: 16 lanes × 32-bit (16 × float32)
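-
A sketch of AVX-512 masking (my illustration): the loop tail is handled with a mask register instead of a scalar cleanup loop, so inactive lanes are simply not loaded or stored.
```cpp
#include <immintrin.h>

void add_f32_avx512(const float* a, const float* b, float* out, int n) {
    for (int i = 0; i < n; i += 16) {
        // One mask bit per valid element in this iteration.
        __mmask16 m = (n - i >= 16) ? 0xFFFF
                                    : (__mmask16)((1u << (n - i)) - 1);
        __m512 va = _mm512_maskz_loadu_ps(m, a + i);  // masked-off lanes read as 0
        __m512 vb = _mm512_maskz_loadu_ps(m, b + i);
        _mm512_mask_storeu_ps(out + i, m, _mm512_add_ps(va, vb));
    }
}
```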
AVX / AVX2 (Advanced Vector Extensions)
-
AVX2 added full 256-bit integer support.
-
Platform: x86/x64 (Intel & AMD)
-
Vector width: 256-bit registers
-
Typical lane count: 8 lanes × 32-bit (8 × float32)
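-
An AVX2 sketch showing the 256-bit integer support (illustrative; assumes n is a multiple of 8):
```cpp
#include <immintrin.h>

// 8 x int32 adds per instruction, the integer counterpart to 8 x float32.
void add_i32_avx2(const int* a, const int* b, int* out, int n) {
    for (int i = 0; i < n; i += 8) {
        __m256i va = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(a + i));
        __m256i vb = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(b + i));
        _mm256_storeu_si256(reinterpret_cast<__m256i*>(out + i),
                            _mm256_add_epi32(va, vb));
    }
}
```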
SSE (Streaming SIMD Extensions)
-
Superseded by AVX.
-
SSE1–SSE4 progressively added instructions but retained 128-bit width.
-
Platform: x86/x64 (Intel & AMD)
-
Vector width: 128-bit registers
-
Typical lane count: 4 lanes × 32-bit (4 × float32)
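-
The classic 128-bit SSE pattern, as a sketch (assumes n is a multiple of 4):
```cpp
#include <xmmintrin.h>

// 4 floats per instruction with 128-bit XMM registers.
void add_f32_sse(const float* a, const float* b, float* out, int n) {
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
}
```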
MMX
-
Legacy, obsolete.
-
Platform: x86/x64 (Intel & AMD)
-
Vector width: 64-bit registers
-
Typical lane count: 2 lanes × 32-bit (2 × int32; MMX is integer-only, with no float32 support)
RISC-V (pronounced "Risk-Five")
-
Is an open, modular instruction set architecture (ISA) based on the RISC (Reduced Instruction Set Computer) design principles.
-
Unlike proprietary ISAs (e.g., x86 by Intel/AMD, ARM by Arm Ltd.), RISC-V is:
-
Open source: anyone can use or implement it without licensing fees.
-
Modular: it has a minimal base instruction set, with optional extensions (e.g., floating-point, SIMD, vector).
-
RVV (RISC-V Vector Extension)
-
Similar to ARM SVE, RVV allows hardware to define vector width.
-
Not fixed to 128, 256, or 512 bits; code adapts dynamically.
-
Scalable width: Vector registers can be from 128 to 2048 bits, depending on hardware.
-
Vector-Length Agnostic (VLA):
-
Programs don't assume a fixed vector width.
-
Code adapts to hardware at runtime; the same binary works on 128-bit or 512-bit hardware.
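-
A hedged RVV sketch (ratified RVV 1.0 intrinsics; older toolchains may omit the __riscv_ prefix): vsetvl returns how many elements this hardware can process per iteration, so the same source is vector-length agnostic.
```cpp
#include <riscv_vector.h>
#include <cstddef>

// Vector-length-agnostic add: the loop asks the hardware for the
// element count each pass instead of hardcoding the register width.
void add_f32_rvv(const float* a, const float* b, float* out, size_t n) {
    for (size_t i = 0; i < n; ) {
        size_t vl = __riscv_vsetvl_e32m1(n - i);            // elements this pass
        vfloat32m1_t va = __riscv_vle32_v_f32m1(a + i, vl);
        vfloat32m1_t vb = __riscv_vle32_v_f32m1(b + i, vl);
        __riscv_vse32_v_f32m1(out + i, __riscv_vfadd_vv_f32m1(va, vb, vl), vl);
        i += vl;
    }
}
```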
-
GPU
-
GPUs use SIMT (Single Instruction, Multiple Thread), not SIMD per se, but functionally similar at scale.
CUDA
-
NVIDIA GPUs
-
Vector width: Scalar SIMT
-
Typical lane count: 32 threads per warp, each operating on 32-bit scalars (effectively 32 × float32)
OpenCL
-
Cross-vendor GPU compute
-
Vector width: Variable
-
Typical lane count: device-dependent; e.g., 8 lanes × 32-bit (8 × float32) when using float8 vector types
Wavefronts / Warps
-
Used in GPU shaders
-
Vector width: 32/64 threads
-
Typical lane count: 32 threads (NVIDIA warp) or 64 threads (AMD wavefront) × 32-bit